software cannot compute the model, causing an error. In a logistic regression model, as discussed in Chapter 18, each time you add a covariate, you increase the overall likelihood of the model. In Chapter 17, which focuses on ordinary least-squares regression, adding a covariate increases the explained (regression) sum of squares — equivalently, it reduces the error sum of squares.
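This property can be demonstrated numerically. The following is a minimal sketch (the data and variable names are our own illustration, not from this book): in ordinary least-squares regression, adding a covariate can never increase the residual (error) sum of squares, even when the new covariate is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
noise_covariate = rng.normal(size=n)          # unrelated to y by design
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def residual_ss(X_cols, y):
    """Residual sum of squares from an OLS fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *X_cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rss_small = residual_ss([x1], y)                   # y ~ x1
rss_big = residual_ss([x1, noise_covariate], y)    # y ~ x1 + noise

print(f"RSS with 1 covariate:  {rss_small:.2f}")
print(f"RSS with 2 covariates: {rss_big:.2f}")
assert rss_big <= rss_small + 1e-9  # the fit never gets worse
```

Because the smaller model is nested inside the larger one, the larger model can always reproduce the smaller model's fit, so its residual sum of squares is never higher.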

What this means is that you don’t want to add covariates to your model that merely add complexity without helping with the overall goal of model fit. A good strategy is to find the best collection of covariates that together account for as much error as possible. Think of it like roommates who share apartment-cleaning duties: it’s best if they split up the apartment and each clean different parts of it, rather than insisting on cleaning the same rooms, which would be a waste of time. The term parsimony refers to including the fewest covariates in your regression model that explain the most variation in the dependent variable. The modeling approaches discussed in the next section explain ways to develop such parsimonious models.

Adjusting for confounders

When designing a regression analysis, you first have to decide: Are you doing an exploratory analysis, or are you doing a hypothesis-driven analysis? If you are doing an exploratory analysis, you do not have a prespecified hypothesis. Instead, your aim is to answer the research question, “What group of covariates do I need to include as independent variables in my regression to predict the outcome and get the best model fit?” In this case, you need to select a set of candidate covariates and then come up with modeling rules to decide which groups of covariates produce the best-fitting model. Each chapter on regression in this book provides methods of comparing models using model-fit statistics; you would use those to choose your final model for your exploratory analysis. Exploratory analyses are considered descriptive studies, which are weak study designs (see Chapter 7).
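One widely used model-fit statistic is AIC (the Akaike Information Criterion), which penalizes a model’s likelihood for every extra parameter; the model with the lower AIC is preferred. Here is a hedged sketch (the data and helper function are our own illustration): for OLS, AIC can be computed, up to an additive constant, from the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)       # a covariate that truly matters
junk = rng.normal(size=n)     # a covariate that does not
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

def ols_aic(X_cols, y):
    """AIC (up to an additive constant) for an OLS fit with intercept:
    n * ln(RSS / n) + 2k, where k counts estimated coefficients."""
    X = np.column_stack([np.ones(len(y)), *X_cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

print("y ~ x1               AIC:", round(ols_aic([x1], y), 1))
print("y ~ x1 + x2          AIC:", round(ols_aic([x1, x2], y), 1))
print("y ~ x1 + x2 + junk   AIC:", round(ols_aic([x1, x2, junk], y), 1))
# Adding x2 lowers AIC because it genuinely explains variation; the junk
# covariate usually raises AIC, because the 2k penalty outweighs its
# tiny, chance reduction in RSS -- parsimony wins.
```

Unlike raw likelihood or R², AIC can go up when a useless covariate is added, which is what makes it suitable for comparing candidate models.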

But if you collected your data based on a hypothesis, you are doing a hypothesis-driven analysis. Epidemiologic studies require hypothesis-driven analyses, in which you have already selected your exposure and outcome, and now you have to fit a regression model predicting the outcome that includes your exposure and confounders as covariates. You know that the exposure and the outcome belong in every model you run; what you may not know is how to decide which confounders stay in the model.

Regardless of whether you are doing exploratory or hypothesis-driven modeling, before you start modeling you need to make rules that describe how you will make decisions during the modeling process and about your final model. For example, you may make a rule that every covariate in your final model must have a p value that is statistically significant at α = 0.05. You can make other stipulations about the final model, or about the process of arriving at it. What is important is that you make the modeling rules and write them down before you start modeling.

You then need to choose a modeling approach, which is the approach you will use to determine which candidate confounders stay in the model with the exposure and which ones are removed. There are three common approaches in regression modeling (although some analysts have their own customized approaches). These approaches don’t have official names, but we will use the terms that are commonly used: forward stepwise, backward elimination, and stepwise selection.
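To make one of these concrete, backward elimination can be sketched as follows (this is our own illustrative code, not from this book; in practice the removal criterion is often a p-value threshold, but here we use AIC for compactness): start with the exposure plus all candidate confounders, then repeatedly drop the candidate whose removal most improves the fit criterion, never dropping the exposure, and stop when no removal helps.

```python
import numpy as np

def ols_aic(cols, y):
    """AIC (up to a constant) for an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

def backward_eliminate(exposure, candidates, y):
    """Return the names of the candidate confounders that survive.
    The exposure is always kept, per the hypothesis-driven design."""
    kept = dict(candidates)  # name -> data column
    while kept:
        current = ols_aic([exposure] + list(kept.values()), y)
        # AIC of each model with one candidate removed
        trials = {
            name: ols_aic([exposure] + [v for k, v in kept.items() if k != name], y)
            for name in kept
        }
        best = min(trials, key=trials.get)
        if trials[best] < current:   # removing this one improves the criterion
            del kept[best]
        else:
            break                    # no removal helps; stop
    return sorted(kept)

# Hypothetical data: one real confounder and one pure-noise candidate
rng = np.random.default_rng(2)
n = 300
exposure = rng.normal(size=n)
conf_a = rng.normal(size=n)                  # genuinely related to y
conf_b = rng.normal(size=n)                  # pure noise
y = 1.0 + 0.6 * exposure + 0.9 * conf_a + rng.normal(size=n)

print(backward_eliminate(exposure, {"conf_a": conf_a, "conf_b": conf_b}, y))
```

Forward stepwise runs the same loop in the opposite direction (start with the exposure alone and add the most helpful candidate each round), and stepwise selection alternates adding and removing steps.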